Clearing screen
On the ZX Spectrum 48K the screen starts at address 16384 and takes exactly 6912 bytes. The first 6144 bytes hold pixel data and the next 768 bytes hold attributes. Clearing the screen therefore means filling the whole pixel area with zeros (and possibly setting the attributes).

Simple and slow clearing of pixel area

    ld hl, 16384    ;pixels
    ld de, 16385    ;pixels + 1
    ld bc, 6143     ;pixel area length - 1
    ld (hl), 0      ;set first byte to 0
    ldir            ;copy bytes

How does it work? LDIR does most of the work. The LDIR instruction is designed to copy a block of data, and it does so one byte at a time. After copying a byte it increases HL and DE by one and decreases BC by one. If BC is not equal to 0, the whole process repeats.

So we set the source to the first byte of the pixel area and the destination to the second byte. LDIR takes the value of the first source byte and copies it to the first destination byte, then continues with the second source byte. Because the second source byte has the same address as the first destination byte, it has just been set. The only thing left to do is to set the very first source byte to zero, which we do with ld (hl), 0. See the diagram:

    Step            HL      DE      BC      memory  memory  memory
                                            16384   16385   16386
    before LDIR     16384   16385   6143    00      ??      ??
    after 1. step   16385   16386   6142    00      00      ??

Another interesting thing is that we copy only 6143 bytes although the pixel area is 6144 bytes long. Look at the last step. Register BC holds 1. HL and DE were increased in every previous step, so just before the last step HL points to address 16384+6142, the last but one pixel byte, and DE points to address 16385+6142, the last pixel byte. The very last copy therefore writes the last pixel byte. If we copied 6144 bytes, the last copy would overwrite the first byte of the attributes.
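The same overlapping-copy trick fills any block of memory, not only the screen. Here is a minimal sketch, assuming a hypothetical routine called fill with an invented calling convention (HL = start address, BC = length in bytes, A = fill value). Neither the name nor the convention comes from this page. The length is assumed to be at least 2, because LDIR started with BC = 0 would copy 65536 bytes.

```z80
;fill (hypothetical helper): set BC bytes starting at HL to the value in A
;assumes BC >= 2
fill    ld (hl), a      ;seed the first byte
        ld d, h
        ld e, l
        inc de          ;DE = HL + 1
        dec bc          ;one byte is already written
        ldir            ;propagate the seed through the rest of the block
        ret
```

Clearing the pixel area would then be ld hl, 16384, then ld bc, 6144, then xor a, then call fill. It is slightly slower than the inline version because of the extra register setup and the call.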
Simple and slow clearing of screen

    ld a, 56        ;attribute (black ink on white paper)
    ld hl, 16384    ;pixels
    ld de, 16385    ;pixels + 1
    ld bc, 6144     ;pixel area length
    ld (hl), l      ;set first byte to 0, as HL = 16384 = $4000 and therefore L = 0
    ldir            ;copy bytes
    ld bc, 767      ;attribute area length - 1
    ld (hl), a      ;set first byte to attribute value
    ldir            ;copy bytes

This routine is very similar to the previous one, but there are some small differences.

We use ld (hl), l to set the first byte of the pixel area. Is that OK? What is actually in register L? We previously set HL to 16384, which is $4000 in hexadecimal. From the hexadecimal form we can see that the low byte of 16384 is 0, so ld (hl), l sets (hl) to zero. The result is the same, so why do it this way? There are two reasons: ld (hl), l occupies one byte in memory while ld (hl), 0 occupies two, and ld (hl), l is faster (7 T) than ld (hl), 0 (10 T). In this case the speed gain does not matter, because the next instruction is LDIR, which takes 21 T per byte and is repeated several thousand times.

The next difference is that we copy 6144 bytes. Why? Because we want to reuse registers HL and DE for the next LDIR, which sets the attribute area. After the very last operation of the first LDIR, HL points to the first byte of the attribute area and DE points to the second byte. Looks familiar? It should. The first byte of the attribute area was overwritten, but that is not important because we have to overwrite it anyway.

Faster clearing of pixel area

LDIR is a workhorse. It can do a lot of work, and it is actually not that slow. It does several things:
#copies a byte from (hl) to (de)
#increases HL and DE and decreases BC
#repeats until BC equals 0
If you try to write assembly code that mimics LDIR, you will end up with code that is bigger and slower than LDIR. So what to do?
We just cut the number of operations to a minimum:
*we don't need to copy bytes, we just want to set (hl) to some value
*most of the time we don't need to increase the 16-bit register HL; in 255 cases out of 256 it is enough to increase the 8-bit register L
*we don't need to check the byte counter after every byte we fill; for example, if we know that the number of bytes is divisible by 4, it is enough to check the counter once every four bytes

The following routine exploits these observations:

            xor a
            ld hl, 16384    ;first byte of pixel area
            ld c, 6         ;6 * (256 * 4) = 6144
    loop2   ld b, a         ;set B to zero, which causes 256 repetitions of the inner loop
    loop1   ld (hl), a      ;set byte to zero
            inc l           ;move to the next byte
            ld (hl), a
            inc l
            ld (hl), a
            inc l
            ld (hl), a
            inc hl          ;this time we are not sure that inc l will not overflow
            djnz loop1      ;repeat for next 4 bytes
            dec c
            jr nz, loop2    ;outer loop, repeat for next 1024 bytes

Let us take a closer look at the routine. We use the BC register to count bytes again, but in a different way. The usual way would be:

            ld bc, count
    loop    {do something}  ;just some operation
            dec bc          ;decrease BC, this instruction does not affect flags
            ld a, b         ;so we have to check the condition (BC equals 0) another way
            or c            ;if B and C are both zero then (B OR C) is zero too
            jp nz, loop     ;if not zero then repeat

This is simple, but we use four instructions to decrease BC, check whether it equals zero and jump to the next iteration if it does not. That takes too much time. Is there an instruction which does all of this at once? Sadly, no. But there is the DJNZ instruction, which does something very similar. It decreases the B register and makes a relative jump if the result is not 0. There is a disadvantage: register B can only hold values from 0 to 255. What happens if B is equal to zero? DJNZ decreases the value, it wraps around and B equals 255. Then DJNZ checks whether B is zero. It holds 255, so the relative jump occurs. So for an initial value of 0 in register B, DJNZ repeats 256 times.
After that it is time to decrease the C register and repeat the inner loop. You can think of it as a fictional register CB, where C holds the most significant byte and B holds the least significant byte.

The second trick is that we do not check the counter after every ld (hl), a and increment. These two instructions take very little time, so a significant share of the time would be spent not on real work but on checking whether the work is finished. If we know that the number of operations is divisible by some number, we can check only occasionally. This technique is called loop unrolling.

The last trick is that we replaced some inc hl instructions with the quicker inc l.

    instruction     HL(dec) HL(hex) H(hex)  L(hex)
                    254     $00fe   $00     $fe
    after INC HL    255     $00ff   $00     $ff
    after INC HL    256     $0100   $01     $00
    after INC HL    257     $0101   $01     $01

                    254     $00fe   $00     $fe
    after INC L     255     $00ff   $00     $ff
    after INC L     0       $0000   $00     $00
    after INC L     1       $0001   $00     $01

                    254     $00fe   $00     $fe
    after INC L     255     $00ff   $00     $ff
    after INC HL    256     $0100   $01     $00
    after INC L     257     $0101   $01     $01

You can see that most of the time inc l has the same effect as inc hl. So we can replace inc hl with inc l in most cases, but we have to make sure that inc hl is used when the time comes. Because we start with $4000 in HL and every fourth increment is done by inc hl, we know that every 256th increment, where register L would overflow, is an inc hl and gives the proper result.

So how fast is our routine? The original routine with LDIR takes about 129000 T (6143 repetitions at 21 T each). This routine takes about 91000 T (1536 passes of the inner loop at 59 T each).

Very fast clearing of pixel area

Now we show how to do the job very fast. The question is "What is the fastest way to write one byte?" It is ld (hl), a (or another instruction from the ld (hl), r family), and we have already used it. But if you ask "What is the fastest way to write two bytes?", the answer is not "ld (hl), a twice". The fastest way to write two bytes to memory is the PUSH instruction.
The PUSH instruction is used to store a register pair on the stack. Because the stack lives in memory, just like the screen, it is possible to set the stack pointer so that pushing values writes to the screen. PUSH takes 11 T; it decreases the stack pointer and stores two bytes in memory. Doing the same with a pair of ld (hl), a and inc l instructions per byte takes 22 T, so PUSH is twice as fast as the previous method. Because PUSH decreases the stack pointer before it writes, we point SP just past the end of the pixel area and fill it from the end downwards.

Like everything in the real world, it has some disadvantages. You have to save the stack pointer and restore it when you finish. More importantly, you have to disable interrupts: an interrupt uses the stack to store the return address (and its handler probably saves registers there too), which can corrupt the screen or crash the whole system.

            di                      ;disable interrupts
            ld (stack + 1), sp      ;store current stack pointer in the operand of ld sp below
            ld hl, 0                ;this value will be stored on the stack
            ld sp, 16384 + 6144     ;one byte past the end of the pixel area
            ld c, 3                 ;3 * (256 * 8) = 6144
    loop2   ld b, l                 ;set B to 0, so DJNZ repeats 256 times
    loop1   push hl                 ;store hl on stack
            push hl                 ;next
            push hl                 ;these four push instructions store 8 bytes on stack
            push hl
            djnz loop1              ;repeat for next 8 bytes
            dec c
            jr nz, loop2
    stack   ld sp, 0                ;operand is overwritten by the saved stack pointer
            ei

In real code you would probably unroll the loop far enough to avoid nested loops and use just a single DJNZ loop.

Final words

You could ask "What is really used in games?" Well, it depends. There are actually not many games which clear the screen during gameplay. Many of them use a back buffer to construct part of the screen and then copy this back buffer to the screen. If such a buffer needs to be cleared, the push method is usually used. But not always. For example, Knight Lore uses a very simple routine that is even slower than everything in this section, and the game still has a very nice presentation. Other games use clever methods to update just a portion of the screen without clearing or redrawing the whole playing area. If you are interested in this, check the Drawing strategy page. But every game has other screens such as the menu, high score and options.
For transitions between these screens the LDIR method is typically used. There is no hurry, and it still does its job in roughly 0.04 second. That is it.
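The push section suggests unrolling further in real code, so that a single DJNZ loop is enough. Here is a sketch of that idea, assuming 12 pushes per pass (my choice, since 256 * 12 * 2 = 6144) and hypothetical label names loop and restore:

```z80
        di                      ;an interrupt would push onto the screen
        ld (restore + 1), sp    ;patch the saved SP into the ld sp below
        ld hl, 0                ;value to write
        ld sp, 16384 + 6144     ;one byte past the end of the pixel area
        ld b, l                 ;B = 0, so DJNZ repeats 256 times
loop    push hl                 ;12 pushes store 24 bytes per pass
        push hl
        push hl
        push hl
        push hl
        push hl
        push hl
        push hl
        push hl
        push hl
        push hl
        push hl
        djnz loop               ;256 * 24 = 6144 bytes
restore ld sp, 0                ;operand is overwritten with the saved SP
        ei
```

By my count each pass takes 12 * 11 + 13 = 145 T, so the loop itself needs roughly 37000 T, compared with roughly 129000 T for the LDIR version. The price is a longer routine and, as before, interrupts disabled for the duration.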